Unified physiological model of audible-visible speech production
Authors
Abstract
In this paper, vocal tract and orofacial motions are measured during speech production in order to demonstrate that vocal tract motion can be used to estimate its orofacial counterpart. The inverse estimation, i.e. recovering vocal tract behavior from orofacial motion, is also possible, but to a lesser extent. The numerical results showed that vocal tract motion accounted for 96% of the total variance observed in the joint system, whereas orofacial motion accounted for 77%. This analysis is part of a wider study in which a dynamical model is being developed to express vocal tract and orofacial motions as a function of muscle activity. This model, currently implemented with multilinear second-order autoregressive techniques, is described briefly. Finally, the strong direct influence that vocal tract and facial motions have on the energy of the speech acoustics is exemplified.
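The abstract does not spell out the estimation procedure, so the following Python sketch is an illustration only, not the authors' implementation. It fits a linear least-squares map between synchronized vocal tract and orofacial channels and scores it by variance accounted for, in the spirit of the reported 96%/77% figures, and includes a minimal second-order autoregressive predictor of the kind the abstract calls a multilinear second-order autoregressive model. All channel counts, shapes, and names (vt, face, fit_linear_map) are assumptions.

# Hedged illustration only; the paper does not publish this code.
import numpy as np

def fit_linear_map(X, Y):
    """Least-squares solution W of Y ~= X @ W, with rows as time frames."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def variance_accounted_for(Y, Y_hat):
    """Fraction of the total variance of Y captured by the prediction Y_hat."""
    resid = np.sum((Y - Y_hat) ** 2)
    total = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - resid / total

def ar2_predict(X, A1, A2):
    """Second-order linear AR prediction: x[t] ~= x[t-1] @ A1 + x[t-2] @ A2.
    A muscle-activation input term would enter here as an extra regressor."""
    return X[1:-1] @ A1 + X[:-2] @ A2

# Toy stand-in for synchronized recordings: 500 frames of 12 vocal tract
# channels driving 18 orofacial marker channels, plus measurement noise.
rng = np.random.default_rng(0)
vt = rng.standard_normal((500, 12))
face = vt @ rng.standard_normal((12, 18)) + 0.1 * rng.standard_normal((500, 18))

W_vt_to_face = fit_linear_map(vt, face)
W_face_to_vt = fit_linear_map(face, vt)
print("VAF, face from vocal tract:", variance_accounted_for(face, vt @ W_vt_to_face))
print("VAF, vocal tract from face:", variance_accounted_for(vt, face @ W_face_to_vt))

With the toy data above, the face-from-vocal-tract fit explains almost all of the variance while the reverse direction explains less, mirroring the asymmetry the abstract reports between the two directions of prediction.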
Similar Resources
Low-Audible Speech Detection using Perceptual and Entropy Features
Low-audible speech detection is important since such speech conveys a significant amount of speaker information and meaning. The performance of Automatic Speaker Recognition (ASR) and speaker identification systems drops considerably when low-audible speech is provided as input. In order to improve the performance of such systems, low-audible speech detection is essential. The production, acoustic a...
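The excerpt mentions perceptual and entropy features without giving a recipe; as one plausible example of an entropy feature, the sketch below computes frame-wise spectral entropy, which tends to be higher for noise-like, low-audible frames than for clearly voiced ones. The frame length, hop size, and function names are assumptions, not the cited paper's exact feature set.

# Hedged sketch of one candidate entropy feature, not the cited paper's recipe:
# Shannon entropy of each frame's normalized power spectrum.
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Entropy (bits) of the normalized power spectrum of a windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    p = spectrum / (spectrum.sum() + eps)
    return float(-np.sum(p * np.log2(p + eps)))

def frame_entropies(signal, frame_len=400, hop=160):
    """Frame-wise spectral entropy for a 1-D signal (e.g. 25 ms / 10 ms frames at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([spectral_entropy(signal[i * hop : i * hop + frame_len])
                     for i in range(n_frames)])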
Audiovisual Speech Recognition with Articulator Positions as Hidden Variables
Speech recognition, by both humans and machines, benefits from visual observation of the face, especially at low signal-to-noise ratios (SNRs). It has often been noticed, however, that the audible and visible correlates of a phoneme may be asynchronous; perhaps for this reason, automatic speech recognition structures that allow asynchrony between the audible phoneme and the visible viseme outpe...
Influence of Phone-Viseme Temporal Correlations on Audiovisual STT and TTS Performance
In this paper, we present a study of the temporal correlations of audiovisual units in continuous Russian speech. The corpus-based study identifies natural time asynchronies between the flows of the audible and visible speech modalities, partially caused by the inertia of the articulatory organs. Original methods for speech asynchrony modeling have been proposed and studied using bimodal ASR and TTS system...
Audible Aspects of Speech Preparation
Noises made before the acoustic onset of speech are typically ignored, yet may reveal aspects of speech production planning and be relevant to discourse turn-taking. We quantify the nature and timing of such noises, using an experimental method designed to elicit naturalistic yet controlled speech initiation data. Speakers listened to speech input, then spoke when prompt material became visible...
Perception of Synthesized Audible and Visible Speech
The research reported in this paper uses novel stimuli to study how speech perception is influenced by information presented to ear and eye. Auditory and visual sources of information (syllables) were synthesized and presented in isolation or in factorial combination. A five-step continuum between the syllables /ba/ and /da/ was synthesized along both auditory and visual dimensions, by varying ...